Application of HITS algorithm to detect terms and sentences with high saliency scores
Abstract
We adopt the HITS algorithm and assess how well its mutual reinforcement principle detects terms and sentences with high saliency scores in documents. We give an overview of the goal, possible applications, and the implementation, describe and justify the weighting scheme used, and report experimental results along with ideas for further research.

1 Goal and application

The goal is to extract terms and sentences that are salient for the content of a document. Document summarization, a very active research area, is the most important possible application. Given the present explosion in the amount of available information, summarization is becoming increasingly important, e.g. for presenting retrieval results or identifying new information. Note that the technique described here only extracts terms and sentences and should be complemented with other techniques.

2 Weighting scheme and HITS algorithm

In this section we elaborate on the HITS algorithm and the weighting scheme used. The HITS algorithm, first described by Kleinberg [1], was originally used to detect high-scoring hub and authority web pages through a mutual reinforcement principle between them. This principle states that a web page is a good authority if it is pointed to by many good hubs, and that a good hub page points to many good authorities. A graph is constructed with nodes representing web pages and directed edges representing hyperlinks, and each node gets an authority score and a hub score.

For the detection of terms and sentences with high saliency scores, we construct a directed acyclic graph. All sentences of a document are represented by 'sentence nodes' that have outgoing directed edges to 'term nodes' for each term they contain. These edges can carry two weights: one used in the 'forward sense' of the edge (downstream, along the arrow) and one in the 'backward sense' (upstream).
The HITS algorithm is slightly adapted to incorporate these two edge weights in the calculations. The edge weights indicate the frequency of the term in the corresponding sentence, optionally multiplied by the weight factors described below. The mutual reinforcement principle can then be reformulated as: "A salient term is a term that occurs in many salient sentences, and a salient sentence is a sentence that contains many salient terms."

After any HTML tags have been removed from the documents, the sentences they contain are preprocessed. First, depending on the user's choice, stopwords can be removed, as well as all non-alphanumeric characters (except hyphens). Isolated numbers are considered to carry little content and are also removed. The remaining terms are then stemmed using the Porter stemmer
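
The mutual reinforcement between sentences and terms described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `saliency_hits`, the fixed iteration count, the L2 normalization step, and the choice to use the plain within-sentence term frequency as both the forward and the backward edge weight are assumptions made for illustration. The input is assumed to be already preprocessed (tokenized, stopwords removed, stemmed).

```python
import math
from collections import Counter


def saliency_hits(sentences, iterations=50):
    """Weighted HITS sketch on a bipartite sentence -> term graph.

    sentences: list of lists of preprocessed terms (hypothetical input format).
    Returns (sentence_scores, term_scores).
    """
    # Edge weight = frequency of the term in the sentence.
    # Assumption: the same weight is used in forward and backward sense.
    weights = [Counter(terms) for terms in sentences]
    vocab = sorted({t for w in weights for t in w})

    sent_score = [1.0] * len(sentences)
    term_score = {t: 1.0 for t in vocab}

    for _ in range(iterations):
        # A salient term occurs in many salient sentences.
        new_term = {t: 0.0 for t in vocab}
        for s, w in enumerate(weights):
            for t, freq in w.items():
                new_term[t] += freq * sent_score[s]
        norm = math.sqrt(sum(v * v for v in new_term.values())) or 1.0
        term_score = {t: v / norm for t, v in new_term.items()}

        # A salient sentence contains many salient terms.
        new_sent = [
            sum(freq * term_score[t] for t, freq in w.items()) for w in weights
        ]
        norm = math.sqrt(sum(v * v for v in new_sent)) or 1.0
        sent_score = [v / norm for v in new_sent]

    return sent_score, term_score


# Toy document: three already-preprocessed sentences (made-up data).
toy_sentences = [
    ["hits", "graph", "score"],
    ["graph", "score"],
    ["weather"],
]
sent_scores, term_scores = saliency_hits(toy_sentences)
```

In this toy input the first sentence shares terms with the second, so both reinforce each other, while the isolated third sentence and its only term end up with scores close to zero. Separate forward and backward weights could be supported by keeping two weight tables instead of one.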